Reducing Hallucinations in Medical AI: A Multimodal Architecture with Retrieval-Augmented Validation and Safety-Aware Triage

Authors: Mr. Arrol Dsouza, Mr. Vinit Jain, Mr. Anish Kanojia, Ms. Swati Mahalle

DOI Link: https://doi.org/10.22214/ijraset.2026.80856

Abstract

Rising demand for health services has stretched healthcare resources to their limit, making it difficult for people to obtain early advice, recognize health problems, and understand their medical reports. To bridge this gap, this paper proposes a Personalized Medical Intelligence Agent for early healthcare interactions, designed with safety and ethical responsibility in mind. The agent accepts text, voice, and image inputs, which gives doctors a fuller picture of the problem. Using a medical language model together with a knowledge framework, it helps people interpret their health reports, judge whether they need to consult a doctor, and gauge how severe their condition is. A triage module then escalates cases assessed as critically risky. The approach aims to make healthcare services more accessible without replacing licensed doctors.

Introduction

This paper examines the growing use of AI in healthcare, particularly for medical triage and diagnostic assistance, and highlights the major challenges of reliability, safety, and hallucination in AI-generated medical content.

Healthcare systems face increasing pressure from population growth and limited medical resources, which leads to delays in diagnosis and treatment. Although telemedicine and AI-based assistants are emerging, they often suffer from inaccurate outputs, hallucinations, limited multimodal capabilities, and a lack of proper validation mechanisms, all of which can be dangerous in medical contexts.

The paper proposes a multimodal medical AI system designed to address these limitations. It processes text, speech, and medical images, and combines several techniques, including:

Retrieval-Augmented Generation (RAG) for grounding responses in trusted medical knowledge
Self-correction mechanisms to reduce errors and inconsistencies
Evidence-based validation to ensure factual accuracy
Confidence scoring to assess reliability of responses
Risk-aware triage to classify patient severity and guide responses safely
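The retrieval step behind RAG grounding can be illustrated with a minimal sketch. The paper does not specify its retriever, embedding model, or knowledge base, so this example assumes plain term-frequency cosine similarity over a small in-memory list of trusted passages; the passages and the function names `tokenize`, `cosine`, and `retrieve` are hypothetical. A production system would more likely use dense embeddings and a vector index.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k passages most similar to the query, highest score first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical "trusted" passages standing in for a curated medical knowledge base.
    knowledge_base = [
        "Chest pain with shortness of breath can indicate a cardiac emergency.",
        "A mild headache often resolves with rest and hydration.",
        "High fasting blood glucose may suggest diabetes and needs follow-up.",
    ]
    query = "sudden chest pain and shortness of breath"
    for score, passage in retrieve(query, knowledge_base, k=1):
        print(f"{score:.2f}  {passage}")
```

In a RAG setup, the top-ranked passages would be inserted into the LLM prompt so that the generated answer can be grounded in, and later checked against, trusted text.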

The methodology describes a step-by-step pipeline where user inputs are processed, relevant medical documents are retrieved, and responses are generated by an LLM. These responses are then refined through iterative correction and validated against evidence before being delivered. The system also assigns a risk level (low, medium, high) and adjusts the response accordingly, prioritizing safety in critical cases.
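The pipeline described above can be sketched as a generate, validate, and correct loop followed by a risk-based adjustment of the response. This is an illustrative sketch under stated assumptions, not the authors' implementation: the LLM is replaced by a caller-supplied `generate` function (here the hypothetical stub `fake_llm`), evidence support is approximated by word overlap between each generated sentence and the retrieved passages, the confidence score is simply the fraction of supported sentences, and the keyword lists that assign low, medium, or high risk are placeholders rather than clinical criteria.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical keyword lists; a real system would use validated clinical triage criteria.
HIGH_RISK_TERMS = ("chest pain", "shortness of breath", "unconscious", "severe bleeding")
MEDIUM_RISK_TERMS = ("fever", "vomiting", "persistent")


@dataclass
class TriageResult:
    answer: str
    confidence: float  # fraction of answer sentences supported by the evidence
    risk: str          # "low", "medium", or "high"


def words(text: str) -> set:
    """Lowercase word set used for a crude overlap check."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(sentence: str, evidence: List[str], min_overlap: float = 0.5) -> bool:
    """A sentence counts as supported if enough of its words occur in the evidence."""
    sentence_words = words(sentence)
    if not sentence_words:
        return False
    evidence_words = set().union(*(words(passage) for passage in evidence))
    return len(sentence_words & evidence_words) / len(sentence_words) >= min_overlap


def validate(draft: str, evidence: List[str]) -> Tuple[str, float]:
    """Keep only supported sentences and score confidence as the supported fraction."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", draft) if s.strip()]
    supported = [s for s in sentences if is_supported(s, evidence)]
    confidence = len(supported) / len(sentences) if sentences else 0.0
    return " ".join(supported), confidence


def assess_risk(user_input: str) -> str:
    """Assign a coarse risk level from keyword matches in the user's description."""
    text = user_input.lower()
    if any(term in text for term in HIGH_RISK_TERMS):
        return "high"
    if any(term in text for term in MEDIUM_RISK_TERMS):
        return "medium"
    return "low"


def run_pipeline(
    user_input: str,
    evidence: List[str],
    generate: Callable[[str, List[str], str], str],
    max_rounds: int = 2,
    min_confidence: float = 0.8,
) -> TriageResult:
    risk = assess_risk(user_input)
    feedback = ""
    answer, confidence = "", 0.0
    for _ in range(max_rounds):
        draft = generate(user_input, evidence, feedback)
        answer, confidence = validate(draft, evidence)
        if confidence >= min_confidence:
            break
        # Self-correction: pass feedback so the next draft can drop unsupported claims.
        feedback = "Remove or revise claims that the evidence does not support."
    if confidence < min_confidence:
        answer += " Some information could not be verified; please confirm with a clinician."
    # Adjust the response to the risk level, putting safety first for high-risk inputs.
    if risk == "high":
        answer = "URGENT: seek emergency medical care now. " + answer
    elif risk == "medium":
        answer += " Consider consulting a doctor soon."
    return TriageResult(answer.strip(), confidence, risk)


def fake_llm(question: str, evidence: List[str], feedback: str) -> str:
    """Hypothetical stub: the first draft has an unsupported claim; the revision drops it."""
    grounded = "A mild headache often resolves with rest and hydration."
    return grounded if feedback else grounded + " Drinking coffee cures all migraines."


if __name__ == "__main__":
    kb = ["A mild headache often resolves with rest and hydration."]
    print(run_pipeline("I have a mild headache", kb, fake_llm))
```

Dropping unsupported sentences is only one simple correction policy; the thresholds `min_overlap` and `min_confidence` are arbitrary here and would need tuning against clinical data.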

The literature review shows that while AI has improved clinical decision-making and triage efficiency, patient-facing systems still face major challenges, especially in diagnostic accuracy, bias, hallucination, and real-world reliability. This has led to the need for stronger evaluation methods beyond simple accuracy metrics, including safety, factual correctness, and uncertainty estimation.
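To make these evaluation criteria concrete, the sketch below computes three metrics that go beyond plain accuracy: recall on the high-risk class (how many genuine emergencies the triage step catches), expected calibration error (whether confidence scores match observed correctness, one common check for uncertainty estimates), and an unsupported-claim rate. These metrics and the function names are illustrative choices, not the paper's evaluation protocol, and the inputs are assumed to come from a labeled test set.

```python
from typing import List, Sequence


def recall(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "high") -> float:
    """Share of truly positive cases (e.g. high-risk patients) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


def expected_calibration_error(confidences: List[float], correct: List[int], n_bins: int = 10) -> float:
    """Weighted average gap between mean confidence and accuracy over equal-width bins."""
    total = len(confidences)
    if total == 0:
        return 0.0
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0.0 is placed in the first bin.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / total * abs(avg_conf - accuracy)
    return ece


def unsupported_claim_rate(claims_supported: List[bool]) -> float:
    """Fraction of generated claims that no retrieved evidence supports."""
    if not claims_supported:
        return 0.0
    return 1 - sum(claims_supported) / len(claims_supported)


if __name__ == "__main__":
    print(recall(["high", "low", "high"], ["high", "low", "low"]))
    print(expected_calibration_error([0.95, 0.95], [1, 0]))
    print(unsupported_claim_rate([True, True, False, True]))
```

In a triage setting, recall on the high-risk class is usually weighted more heavily than overall accuracy, because a missed emergency is costlier than an unnecessary referral.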

Conclusion

This study presents a multimodal AI-driven medical triage system that combines retrieval-augmented generation, self-correction, evidence validation, and risk-aware decision-making to minimize hallucinations and produce dependable responses. The theoretical assessment suggests improved diagnostic performance, higher recall, better comprehension of multimodal inputs, and greater retrieval accuracy compared with baseline RAG models, along with significantly fewer unsourced claims. This improvement comes at the cost of higher latency, owing to the system's multi-phase operation. All of these findings are theoretical projections derived from design considerations and research trends, not from actual benchmark experiments. Thus, while the proposed system shows promise for reliable medical decision-making, empirical testing is still required before deployment, and further evaluation across diverse medical settings and broader datasets is needed to confirm its effectiveness and adaptability in live environments.


Copyright

Copyright © 2026 Mr. Arrol Dsouza, Mr. Vinit Jain, Mr. Anish Kanojia, Ms. Swati Mahalle. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET80856

Publish Date : 2026-04-23

ISSN : 2321-9653

Publisher Name : IJRASET